worker: fix premature addon unload#63575
Conversation
|
Review requested:
|
4286ff7 to
b921cc6
Compare
912017b to
6144e5a
Compare
Weak callbacks could run after addons were unloaded, leading to a crash. Signed-off-by: Mohamed Akram <mohd.akram@outlook.com>
6144e5a to
50023d9
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #63575 +/- ##
========================================
Coverage 90.33% 90.33%
========================================
Files 730 730
Lines 234362 234464 +102
Branches 43908 43909 +1
========================================
+ Hits 211708 211810 +102
- Misses 14376 14378 +2
+ Partials 8278 8276 -2
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
This seems like a valid bug, but it doesn't seem like an issue in the workers implementation, and rather one in node::ObjectWrap; the class should add an explicit cleanup hook via AddEnvironmentCleanupHook() along its weak callback.
node::ObjectWrap in general has been unmaintained and really should have been deprecated a long time ago, but yeah, that's the place to fix this.
|
@addaleax But this problem doesn't happen in the main thread. I had first thought about solving it in ObjectWrap as well, but that will not fix the case where addons use WeakCallback directly, and potentially other situations where a callback lingers. Also, as you mentioned, the ObjectWrap code has not been modified for many years and I would be hesitant to make any change there lest it break some other thing. It is exceptionally difficult to debug this type of bug because the code is gone so it doesn't appear in the stack trace. Per the comment that's there in the code, this was disabled for the main thread presumably because of certain issues, and the same ones will ostensibly happen if an add-on is only loaded in the worker thread, so IMO it would be good to align the main and worker thread behaviors. It also makes sense from a pure conceptual perspective that unloading should happen after all code has stopped running, and I'm not sure if there's some benefit for doing it earlier. |
That's part of the problem – this isn't specific to
It's disabled in the main thread because legacy addons that are not aware of Environment cleanup hooks are expected to keep working as they are, that's all. This is also why we call out in our docs that addons need to explicitly opt into supporting worker threads: https://nodejs.org/api/addons.html#worker-support
That's not a good reason to keep it buggy tbh.
At the very least, there needs to be something that ensures that embedder-created |
There's an argument for not dlclosing at all. It is allowed to be a no-op after all. My understanding is that opening and closing libraries several times can cause issues. I could not find the original motivation for adding the close. On the one hand the dlopen happens inside Node.js, but in reality it's opened for the whole process, so it's not strictly tied to the Environment. Even if one were to add an environment cleanup hook, what if you need to do some asynchronous cleanup?
In the current behavior, without this PR, would that close code run if you created the embedder instance in the main thread? |
It is a no-op under specific circumstances, with legacy addons that have been written for older versions of Node.js that did not have cleanup logic in place.
It can expose bugs in libraries, yes, but it's not inherently problematic.
Yeah, but the concept of ownership doesn't stop applying because it interacts with process-global state.
Well ... you use the async environment cleanup hook version, right?
Yeah, I have to correct myself here – currently we don't expose the |
I meant on the OS level,
Yes, but shouldn't the process be the owner?
In that case, would it make sense to now align the worker thread behavior with the main thread? At least until this is figured out. |
... no? That would be like saying that file descriptors are owned by processes and shouldn't be cleaned up by whichever entity opened them.
In an ideal world, we'd have aligned these behaviors, yes – by also performing resource cleanup on the main thread. That hasn't been practical at any point so far, though.
I really don't think there's much that needs to be figured out here. If you'd prefer not to fix this bug where it occurs, I'm happy to open a PR myself (and, with your permission of course, include the test case from here). |
Perhaps I worded that poorly. To match this analogy, the Node.js script is the one calling the open to receive a descriptor from the runtime, but the runtime is closing it (prematurely) while the user's program is still using it. With an OS file descriptor, if you open it but don't close it (as in the case of a Node.js script, which does not call close), then the OS will free that resource only once no references to it remain or the process terminates. This is not what is happening currently. The environment is not the one opening the addon, this is what is causing the dissonance for me, and that would explain my approach to the fix.
Please go ahead. Thank you for the feedback. It might be good to also re-open the linked issue until the fix has landed. |
Weak callbacks could run after addons were unloaded, leading to a crash.
Fixes #63540